CPS222 Lecture: Linked Lists                              Last revised 1/31/15

Objectives:

1. To show how to implement linked lists in C++
2. To introduce linked list variants: circular lists, doubly-linked lists, use
   of a header node.

Materials:

1. Handout of StudentList interface and simple list implementation, plus
   executable code.
2. Online versions of various student list implementations to project:
      (w/header)              studentlisth.cc
      (circular w/header)     studentlisthc.cc
      (doubly linked)         studentlistd.cc
3. Handouts developing differences between each version and the preceding one
4. Handout comparing C++ list template with Java List collection

I. Introduction
-  ------------

   A. We will now look at how to implement linked lists. We will use C++ for
      our examples, but the strategy is easily adapted to any language that
      supports pointers (including Java, whose references really behave like
      C++ pointers).

   B. One issue we face in implementing linked lists is that the general
      concept is very generic. In particular, whenever we use a list we
      typically utilize some specific ordering rule to control where new
      elements are added to the list. (E.g. in a first-come first-served
      situation, new elements are always added at the end.)

      1. To make our examples concrete - and because this particular example
         allows us to introduce a variety of concepts - we will consider the
         case of a list of students, maintained in "priority" order by
         academic year - i.e. all seniors will appear before all juniors, who
         will appear before all sophomores ...

         a. A newly added student will appear after all students of the same
            or higher class, but before all students of lower class - e.g. a
            newly added junior will come after all seniors and juniors, but
            before all sophomores.

         b. We will restrict ourselves to accessing just the first student in
            the list, and will only allow ourselves to remove the first
            student in the list. (Maybe this structure is modelling some sort
            of waiting line.)

         c. The elements in our list - our nodes - will contain:

            i.   A person's name
            ii.  A class year (4 = senior, 3 = junior ...)
            iii. A link that points to the next item in the list (often
                 known as next).

            In addition, we will use a single "external pointer" to point to
            the first item in the list.

            EXAMPLE: Using these conventions, suppose we start with an
            initially empty list and insert items into it in the order:

               info         year
               aardvark      3
               buffalo       2
               cat           4
               dog           3

            Then the list would look like this:

           __________     ____________     __________     __________
          |   cat    |   |  aardvark  |   |   dog    |   | buffalo  |
          |    4     |   |     3      |   |    3     |   |    2     |
      o-->|     o----|-->|      o-----|-->|     o----|-->|     o----|---
           ----------     ------------     ----------     ----------   |
                                                                     -----
                                                                      ---

            iv. Note:

                - A single external pointer (not part of any node on the
                  list) points to the first node.
                - Access to any node is by following a chain of one or more
                  pointers. Thus, for example, to get to "dog", one would
                  have to follow the external pointer to "cat", then cat's
                  link to "aardvark", then aardvark's to dog.
                - The last node in the list contains a special pointer value
                  that indicates that there are no more nodes - a null
                  pointer.

         d. A comment is in order on the value used to mark the end of the
            list:

            i.  In Java, there is a reserved word "null" which is a reference
                to nothing, and can be used for this purpose.

            ii. In C++, a pointer with a value of 0 serves the same purpose.
                Conventionally, we refer to such a value by the name NULL
                (all caps) - but this is not a reserved word in C++, as null
                is in Java. Instead, some of the standard system header files
                define NULL appropriately, using #define. If you use NULL in
                a program and get complaints about NULL being undefined, the
                cleanest fix is

                   #include <cstddef>

                which is a standard header that defines NULL. (Older code
                sometimes uses the line

                   #define NULL 0

                instead, but don't do it unless you really need to.) C++11
                also adds the keyword nullptr, which is the preferred way to
                write a null pointer in modern C++.

      2. We will define the following operations on this structure:

         a. constructor
         b. bool isEmpty() - accessor - test to see whether list is empty
         c. void makeEmpty() - mutator - clear out contents
         d. void insert(string name, int year) - mutator
         e. string getFirst() - accessor - returns name of first student
         f. void removeFirst() - mutator - removes first student
         g. void remove(string name) - mutator - removes a specific student
         h. In addition, for demonstration/testing purposes, we will include
            an accessor called print to print out all the nodes on the list,
            in order, to cout.
         i. Proper management of memory will also require us to write some
            additional methods, which we will discuss after considering the
            major methods listed above.

            i.   A destructor to delete all the nodes when the list itself is
                 destroyed - lest we have a "storage leak".
            ii.  A copy constructor - to ensure that we make copies of all
                 the individual nodes when we copy the list
            iii. An assignment operator - for similar reasons

II. Implementing Basic Linked Lists in C++
--  ------------ ----- ------ ----- -- ---

   A. We can now develop the code needed for our example list

   B. studentlist.h  HANDOUT page 1

      1. Walk through the prototypes - note that we will discuss the copy
         constructor, operator =, and destructor later.

      2. Note INCOMPLETE declaration of local class Node (needed to make the
         _first field declaration possible.)

      3. _first is our EXTERNAL POINTER to the list. Note that the first node
         on the list is special - it is pointed to by the external pointer,
         whereas all the other nodes are pointed to by the preceding node.

   C. studentlist.cc  HANDOUT pages 2-4

      Walk through the local class and methods, skipping the last three

      1. Local class Node.

         a. Note how its methods are declared and implemented in the same
            place - appropriate since the details of the class are only
            needed here.

         b. The methods of class StudentList need to manipulate the fields of
            class Node. This is made possible by declaring Node's fields to
            be public; since Node itself is private to class StudentList,
            only StudentList's own code can actually get at them. An
            alternate approach would be to make the fields of Node private,
            and include the following in class Node:

               friend class StudentList;

         c. Note code to report when a node is being destroyed - just for our
            demonstration purposes.

      2. Constructor - makes external pointer NULL

         Time complexity of this operation? ASK

         O(1)

      3. isEmpty() - tests to see whether external pointer is NULL

         Time complexity of this operation? ASK

         O(1)

      4. makeEmpty() - walks through the list, deleting nodes. Note need for
         two variables d and p - we cannot access the link of a node
         (reliably) after it has been deleted.

         i.  Trace for a two node list
         ii. Time complexity of this operation? ASK

             O(n)

      5. insert(name, year)

         a. Basic process for insertion into ANY linked structure is

               Get a node
               Load it up
               Link it in

         b. new Node(name, year) does the first two

         c. Before linking it in, we need to determine where. We can't know
            this until we find the node that belongs AFTER it (one with class
            year less than the new student's.) We need to know the node that
            needs to go BEFORE it in order to link it in correctly.

         d. Note two conditions on loop - we stop if we run off the end of
            the list (in which case the new node goes at the very end) or we
            find a node with year < new student's.

         e. We call the technique of using two pointers like this using a
            LEADING pointer and a TRAILING pointer.

         f. Linking code - the new node must point to its successor (or NULL
            if we ran off the end of the list), and its predecessor must
            point to it. q == NULL implies that the new node goes at the
            start of the list, so we reset the external pointer instead.

         g. Trace process of building up a list, from empty, as follows:

               aardvark 3
               buffalo  2
               cat      4
               dog      3

         h. Time complexity of this operation? ASK

            O(n)

      6. getFirst()

         a. What will happen if _first == NULL (empty list)? ASK

            Note precondition in .h file

         b. Time complexity of this operation? ASK

            O(1)

      7. removeFirst()

         a. Will also crash if _first == NULL

         b. Note that we need to explicitly recycle the node by using delete
            - else it is lost for the remainder of the run of the program!

         c. Time complexity of this operation? ASK

            O(1)

      8. remove(name)

         a. First we search for the node. We need to have a pointer to the
            node BEFORE it as well, since we must modify the pointer in that
            node. Again, we use leading and trailing pointers.

         b. The while loop terminates when we either find the node or run off
            the end of the list.

         c. To unlink the node from the list, we reset the pointer of the
            node BEFORE it - or the external pointer if it is the first node.

         d. Walk through deleting dog, then cat from example used for insert.

         e. Note, again, that we need to explicitly recycle the node by using
            delete - else it is lost for the remainder of the run of the
            program!

         f. Time complexity of this operation? ASK

            O(n)

      9. print()

         a. Involves a loop in which we traverse the list - visiting every
            node.

         b. Time complexity of this operation? ASK

            O(n)

   D. Demonstrate class linked with test driver.

   E. The code for studentlist includes three methods that are necessary
      because of the way C++ does storage allocation.

      1. We begin with the destructor - the last operation listed.

         a. In C++, a class may have an explicit destructor. The signature of
            the destructor is always ~ClassName - with no return value and
            no parameters. If the programmer does not supply a destructor,
            the compiler creates a "Miranda rule" destructor that does
            nothing special - in particular, it does not delete anything the
            object's pointers point to.

         b. The destructor for an object is called in one of the following
            ways.

            i.   If the object is declared as an ordinary variable (not a
                 pointer or reference), then the destructor is automatically
                 called when the variable goes out of scope:

                 - Termination of the program for a static variable.
                 - Exit from a block for an automatic variable.
                 - When it is no longer needed for a temporary variable
                   created by the compiler.

            ii.  If the object is a heap variable accessed by a pointer, the
                 programmer must invoke the delete operation on the pointer.

            iii. It is also possible to call a destructor explicitly (like
                 other methods), but there is a special syntax used, and this
                 is rarely needed.

         c. A programmer-written destructor is needed when the object "owns"
            resources that must be freed when it is destroyed. Here, we
            require that all the nodes on the list be freed up when the
            external pointer to the list is destroyed, lest the storage
            allocated to them be lost for the duration of the program.

            Go over destructor in program. Here, the destructor uses the
            makeEmpty() method to delete all the nodes in the list.

      2. When a class has a destructor, it often needs two other methods to
         prevent resources from being released prematurely.

         a. To see why, consider the following scenario, based on our
            StudentList class (but for now without the copy constructor and
            assignment operator).

               StudentList s;
               s.insert("aardvark", 3);
               s.insert("buffalo", 2);
               foo(s);
               bar();

            where foo() looks like this:

               void foo(StudentList x)
               {
                  ...
               }

            and bar() looks like this:

               void bar()
               {
                  StudentList y;
                  y = s;
               }

            i.   Note that the list is passed to foo by value. When we enter
                 foo, we have the following scenario, because x is
                 initialized to be a copy of s.

                      ------------      --------------    --------------
      Top level       | _first o-|----->|  aardvark  |  ->|  buffalo   |
      variable s      ------------   -->|     3      | /  |     2      |
                                    /   |      o-----|-   |    NULL    |
                      ------------ /    --------------    --------------
      Parameter x     | _first o-|-
                      ------------

                 That is, both s and x refer to the same list of nodes. We
                 say that x is a SHALLOW COPY of s.

            ii.  Now what happens when foo exits? The destructor for x is
                 called, since x is local to foo. This results in the nodes
                 on the list being recycled - destroying the list pointed to
                 by s! (Actually, the _first pointer in s now refers to a
                 recycled node, which can lead to almost anything going
                 wrong.)

            iii. A similar situation happens in bar. The assignment statement
                 causes y to become a shallow copy of s. When bar exits, y is
                 destroyed and the nodes on the list are recycled - again!

         b. To prevent these problems from arising, we must ensure that
            whenever a list is copied, we make a DEEP COPY that copies all
            the nodes, not just the external pointer. That requires us to
            implement two methods.

            i.  A copy constructor - constructor with signature

                   StudentList(const StudentList &)

                - The compiler uses this whenever it must copy an object -
                  e.g. when a parameter is passed by value, or a function
                  result is returned by value.
                - If the programmer doesn't write one, the compiler creates
                  a "Miranda rule" copy constructor that simply copies each
                  member of the object; for a pointer member, that copies
                  just the pointer itself (i.e. a shallow copy if the object
                  contains any pointers.)

            ii. Overload of the assignment operator - method with signature

                   StudentList & operator = (const StudentList &)

                - The compiler uses this whenever assignment is done using =.
                - If the programmer doesn't write one, the compiler creates
                  a "Miranda rule" assignment operator that likewise copies
                  each member (again, a shallow copy if the object contains
                  any pointers.)

         c. Go over code for copy constructor and assignment operator in
            example.

            i.  Note how the copy constructor uses the assignment operator,
                to avoid having to write code twice.

            ii. Note return from assignment operator - needed to allow
                chaining of assignments:

                   a = b = c;

III. Variants of the basic linked list
---  -------- -- --- ----- ------ ----

   A. Use of a header node.

      1. In algorithms involving linked lists, there are certain crucial
         points one must bear in mind:

         a. To insert an item into a linked structure, it is necessary to
            modify the link field of its predecessor. This means that to
            insert an item, we must traverse the list from its beginning
            until we reach the place we want. Often this is done with two
            pointers - a leading pointer and a trailing pointer. When the
            leading pointer hits the item that is to be the SUCCESSOR of our
            new item, the trailing pointer is on its predecessor. We did this
            above.

         b. An exception to the above rule occurs when the item we are
            inserting is the first in the list. Then we modify the EXTERNAL
            POINTER to the structure, since the item has no predecessor. This
            was included as a special case in our insert method above.

         c. A similar principle holds with deleting an item from a linked
            list. We must modify the pointer field in its predecessor to
            point to its successor or, if it is the first item in the list,
            we must modify the external pointer. Again, this was included as
            a special case in our remove(name) method above. (It was the ONLY
            case in our removeFirst() method.)

      2. These points imply that many list algorithms will have the
         following structure:

            Traverse the list, using leading and trailing pointers, until you
            have found the proper place - being sure not to run off the end
            of the list.

            if leading pointer is on the first item in the list then
               modify the external pointer to the list to effect the
               structural change (i.e. to point to the newly inserted node
               or to jump around the deleted node.)
            else
               modify the link field of the node pointed to by the trailing
               pointer to effect the structural change.

      3. The fact that modifications at the front of the list are a special
         case leads to problems in general purpose algorithms. Often, it is
         desirable to eliminate this special case. One method for doing so
         is by the use of a HEADER NODE:

         a. When an "empty" list is created, it is actually created to
            contain one special node called a header. This node is not
            functionally a part of the list as far as a user of the list is
            concerned; but it simplifies the algorithms, since the first
            useful item on the list is actually the successor of the header
            and thus requires no special cases when accessing. (Every node
            that is officially part of the list now has a predecessor node.)

         b. Example: redraw list containing cat 4, aardvark 3, dog 3,
            buffalo 2 with a header.

         c. Note that we don't actually make any use of the values stored in
            this node (_name and _year) - just its link. (We'll discuss a way
            to make use of one of the values later.)

      4. Let's consider how our code would be modified to use a header.

         a. Project studentlisth.cc - down to first #define.

            i.  In the list without a header, the external pointer went to
                the first node. Here it goes to the header, which in turn
                goes to the first node.

            ii. For clarity, we will change the name of the private field
                from _first to _header. Rather than creating a new .h file,
                the code here uses the preprocessor to do the job for us!

         b. Any changes needed to Node class? ASK

            NO

         c. Any changes needed to constructor? ASK

            Discuss code projected versus handout

         d. Any changes needed to isEmpty()? ASK

            Discuss code projected versus handout

         e. Any changes needed to makeEmpty()? ASK

            Discuss code projected versus handout - note that we don't
            recycle the header, just the nodes containing data

         f. Any changes needed to insert()? ASK

            Discuss code projected versus handout

         g. Any changes needed to getFirst()? ASK

            Discuss code projected versus handout

         h. Any changes needed to removeFirst()? ASK

            Discuss code projected versus handout

         i. Any changes needed to print()? ASK

            Discuss code projected versus handout

         j. Any changes needed to copy constructor? ASK

            Discuss code projected versus handout

         k. Any changes needed to operator = ? ASK

            Discuss code projected versus handout

         l. Any changes needed to destructor? ASK

            Discuss code projected versus handout

         HANDOUT with changes

   B. Another special case occurs when we reach the end of a linked list.
      The end is normally marked by having the last item in the list have a
      link value that does not represent a legal pointer to a node - i.e.
      NULL.

      1. We must be terribly careful we do not use this as if it were a
         legal pointer - e.g. the while loops in insert and remove test for
         p != NULL before they examine p -> _year (insert) or p -> _name
         (remove).
         C++ guarantees that if two conditions are connected by &&, and the
         first proves to be false, the second won't even be tested. (False
         and anything is false.)

      2. One way to avoid having to be careful to always check for this
         special case would be with the use of a TRAILER node.

         a. The trailer node would contain a year LESS than any possible
            value - e.g. 0. The while loop would then be:

               while (p -> _year >= year)

            and this would be guaranteed to exit when p is on the trailer.

         b. The initial condition of an empty list would be two nodes: an
            external pointer pointing to a header pointing to a trailer.

         c. However, this approach as such is not often used, because of the
            need to create two special nodes just to make an empty list.

      3. An alternate approach to achieving nearly the same effect is the use
         of CIRCULAR LINKING.

         a. In this approach the last actual node, instead of containing a
            NULL pointer, points back to the first node on the list. If a
            header is used (as it often is with circular linking), then the
            "first node" is the header node, of course. Thus, a circular list
            with a header looks like this:

               Empty list:     +--> [ Header ]--+    -- header points to itself
                               |                |
                               +----------------+

               List w/2 real   +--> [ Header ]--> [    ]--> [    ]--+
               elements:       |                                    |
                               +------------------------------------+

         b. Recall that, in the example we have developed thus far, we didn't
            make any use of the _name or _year fields of the header. Now what
            we will do is store in the header a year that is SMALLER than any
            possible year - e.g. 0. (If the list were in increasing order, we
            would store a larger value than any possible legal value, of
            course.) Thus, in effect, the same node serves as BOTH a header
            and a trailer.

      4. Let's consider how our code would be modified to use circular
         linking along with a header.

         a. Project studentlisthc.cc - down to just before class Node.

            We will now consider changes in this code relative to the version
            with a header we just considered - i.e. cumulatively from our
            original code.
            (Circular lists don't always have headers, but it often makes
            sense to do so, as in this case.)

         b. Any changes needed to Node class? ASK

            NO

         c. Any changes needed to constructor? ASK

            Discuss code projected versus handout

         d. Any changes needed to isEmpty()? ASK

            Discuss code projected versus handout

         e. Any changes needed to makeEmpty()? ASK

            Discuss code projected versus handout - note that we don't
            recycle the header, just the nodes containing data

         f. Any changes needed to insert()? ASK

            Discuss code projected versus handout

         g. Any changes needed to getFirst()? ASK

            Discuss code projected versus handout

         h. Any changes needed to removeFirst()? ASK

            Discuss code projected versus handout

         i. Any changes needed to print()? ASK

            Discuss code projected versus handout

         j. Any changes needed to copy constructor? ASK

            Discuss code projected versus handout

         k. Any changes needed to operator = ? ASK

            Discuss code projected versus handout

         l. Any changes needed to destructor? ASK

            Discuss code projected versus handout

         HANDOUT with changes

   C. One more modification: Double Linking

      1. Recall that linked lists, as we have developed them thus far, are
         like one-way streets.

         a. You can get from a node to its successor by following one link.

         b. The only way to get to the predecessor of a node is to start at
            the beginning of the list, using leading and trailing pointers,
            until the leading pointer hits the node you want.

      2. Sometimes, it is desired to be able to go both forward AND backward
         from a given node in a list. If this is the case, we can use a
         DOUBLY-LINKED list, in which each node contains two pointers: one
         to its successor and one to its predecessor.

               [   ] --> [   ] --> [   ]
               [   ] <-- [   ] <-- [   ]

         a. We will refer to these pointers as the forward link and the
            backward link - or next and prev for short.

         b. Doubly-linked lists need not use a header node, but often do
            (because otherwise the special cases become a real problem.) If
            the list does have a header, then:

            i.   The next of the header points to the first real item.
            ii.  The prev of the first real item points to the header.
            iii. The prev of the header MAY be used to point to the last
                 item, if this is useful (and it often is.)

         c. Indeed, when a header is used, it is often expedient to also use
            circular linking.

            i.   Having the prev of the header point to the last item is part
                 of this.
            ii.  Likewise, the next of the last item would point to the
                 header.
            iii. QUESTION: What would an empty doubly-linked circular list
                 with a header look like?

                 -- a doubly-narcissistic header node!

         d. Redraw original example doubly-linked circular with header.

      3. Let's consider how our code would be modified to use double
         linking, along with circular linking and a header.

         a. Project studentlistd.cc - down to just before class Node.

            We will now consider changes in this code relative to the
            circular version with a header we just considered - i.e.
            continuing to be cumulative from our original code.

         b. Any changes needed to Node class? ASK

            YES - each node now needs a second link field, prev, in addition
            to next.

         c. Any changes needed to constructor? ASK

            Discuss code projected versus handout

         d. Any changes needed to isEmpty()? ASK

            Discuss code projected versus handout

         e. Any changes needed to makeEmpty()? ASK

            Discuss code projected versus handout - note that we don't
            recycle the header, just the nodes containing data

         f. Any changes needed to insert()? ASK

            Discuss code projected versus handout

         g. Any changes needed to getFirst()? ASK

            Discuss code projected versus handout

         h. Any changes needed to removeFirst()? ASK

            Discuss code projected versus handout

         i. Any changes needed to print()? ASK

            Discuss code projected versus handout

         j. Any changes needed to copy constructor? ASK

            Discuss code projected versus handout

         k. Any changes needed to operator = ? ASK

            Discuss code projected versus handout

         l. Any changes needed to destructor? ASK

            Discuss code projected versus handout

         HANDOUT with changes

IV. Lists in the C++ Standard Template Library (STL)
--  ----- -- --- --- -------- -------- ------- -----

   A. The Standard Template Library (STL) includes a list template, which
      actually uses a linked list, but doesn't require the user to deal
      directly with pointers.

   B. HANDOUT comparing the C++ list template and the Java List collection

      Some points to note:

      1. The list template is made available by

            #include <list>

      2. The template is instantiated for a particular type of list element

         a. This is accomplished by

               typedef list < Student * > WaitingList;

            (near bottom of page 1)

         b. A field of this type is then created by

               WaitingList waiting;

         c. These two declarations could have been combined into one
            declaration

               list < Student * > waiting;

            - However, this would complicate the syntax needed in the
              implementation file, and use of typedef is normally preferred.

         d. We use Student * as the type for the list elements, rather than
            Student, because we only want one object per student, perhaps
            with multiple references to it. (Recall that the C++ pointer is
            closest in meaning to the Java reference.)

      3. The Java List interface has two commonly used implementations:
         ArrayList and LinkedList. The C++ list template is always
         implemented by a linked list - a doubly linked one, at that!

      4. The C++ list template does NOT include a method for checking to see
         if a particular element occurs in the list (analogous to the Java
         List contains() method). In the C++ version of isWaiting(), then,
         we have to do things the hard way by going through the list one
         element at a time and checking for a match. We can return true as
         soon as we find a match. We return false if we complete the loop
         without finding a match. (The generic algorithm std::find, declared
         in <algorithm>, performs the same linear search.)

      5. Both the C++ STL and the Java collections facility support iterators
         for systematically visiting the elements of a collection - but the
         syntax is quite different.

         a. For each instantiation of a C++ STL container template, there are
            corresponding iterator types.
            In this example, when we instantiated list for Student * using
            the typedef name WaitingList, we also automatically created four
            types of iterators:

               WaitingList::iterator
               WaitingList::const_iterator
               WaitingList::reverse_iterator
               WaitingList::const_reverse_iterator

            i.  The const forms of the iterator do not permit modification of
                the elements of the container through them - and must be
                used if we create an iterator in a const method.

            ii. The reverse forms of the iterator go through the container
                backwards! (Not all container types support reverse
                iteration, but list does.)

         b. A C++ container has at least two methods that create iterators -
            begin() returns an iterator that references the first element in
            the collection, and end() returns one that refers one past the
            end of the collection. (Containers that support reverse
            iterators, as list does, also have methods rbegin() and rend().)

         c. An iterator supports the following operations:

            i.   == - compare two iterators. Two iterators are equal just
                 when they refer to the same list element. Of course, != is
                 defined as ! ( == ). The comparison iter != end() has a
                 similar meaning to the Java iterator method hasNext().

            ii.  * - dereference the iterator to get at the element it
                 currently refers to - similar to one of the functions of
                 Java's next() method.

            iii. ++ - advance the iterator to the next element - similar to
                 the other function of Java's next() method.

         d. Note carefully the C++ code that uses iterators in isWaiting()
            and printReport(). This code is a "C++ idiom".
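Looking back at the basic operations walked through in Section II.C, here is a minimal self-contained sketch of a NULL-terminated StudentList without a header. It is not the handout's code, which is not reproduced here. It follows the lecture's names (_first, Node, insert, getFirst, removeFirst, remove, makeEmpty, print) and its leading/trailing pointer technique. Several details are assumptions of this sketch:

- it writes C++11's nullptr where the lecture writes NULL;
- print() takes an output stream that defaults to cout;
- remove(name) silently ignores a name that is not on the list;
- copying is disabled with = delete, since Section II.E explains why the compiler-generated copy would be dangerous.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Sketch: singly linked StudentList, no header, null-terminated.
// Nodes are kept in decreasing order of class year (seniors first).
class StudentList
{
  public:
    StudentList() : _first(nullptr) { }
    ~StudentList() { makeEmpty(); }

    // Copying is disabled in this sketch (see Section II.E).
    StudentList(const StudentList &) = delete;
    StudentList & operator = (const StudentList &) = delete;

    bool isEmpty() const { return _first == nullptr; }

    void makeEmpty()
    {
        Node * p = _first;
        while (p != nullptr)
        {
            Node * d = p;              // node about to be deleted
            p = p -> _next;            // read its link BEFORE deleting it
            delete d;
        }
        _first = nullptr;
    }

    void insert(std::string name, int year)
    {
        Node * newNode = new Node(name, year);   // get a node, load it up
        Node * q = nullptr;                      // trailing pointer
        Node * p = _first;                       // leading pointer
        // && short-circuits, so p -> _year is never examined when p is null
        while (p != nullptr && p -> _year >= year)
        {
            q = p;
            p = p -> _next;
        }
        newNode -> _next = p;                    // link it in
        if (q == nullptr)
            _first = newNode;                    // new node goes at the front
        else
            q -> _next = newNode;
    }

    // Precondition: !isEmpty()
    std::string getFirst() const { return _first -> _name; }

    // Precondition: !isEmpty()
    void removeFirst()
    {
        Node * old = _first;
        _first = _first -> _next;
        delete old;                              // recycle the node
    }

    void remove(std::string name)
    {
        Node * q = nullptr;                      // trailing pointer
        Node * p = _first;                       // leading pointer
        while (p != nullptr && p -> _name != name)
        {
            q = p;
            p = p -> _next;
        }
        if (p == nullptr)
            return;                              // assumption: ignore unknown names
        if (q == nullptr)
            _first = p -> _next;                 // unlinking the first node
        else
            q -> _next = p -> _next;
        delete p;
    }

    void print(std::ostream & out = std::cout) const
    {
        for (Node * p = _first; p != nullptr; p = p -> _next)
            out << p -> _name << " " << p -> _year << std::endl;
    }

  private:
    struct Node
    {
        Node(std::string name, int year)
          : _name(name), _year(year), _next(nullptr) { }
        std::string _name;
        int _year;
        Node * _next;
    };

    Node * _first;                               // the external pointer
};
```

Inserting aardvark 3, buffalo 2, cat 4, dog 3 in that order leaves the nodes as cat, aardvark, dog, buffalo, matching the diagram in Section I.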
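For the three memory-management methods of Section II.E, a sketch under the same Node layout might look like the following. As the lecture describes, the destructor calls makeEmpty() and the copy constructor reuses operator =. This sketch makes two choices of its own, and the handout's version may differ:

- a self-assignment check (this != &other), without which s = s would delete s's nodes before copying them;
- a tail pointer so that the copy keeps the original node order.

```cpp
#include <cassert>
#include <string>

// Sketch: singly linked StudentList whose copy constructor, assignment
// operator, and destructor manage the nodes (a DEEP copy, not a shallow one).
class StudentList
{
  public:
    StudentList() : _first(nullptr) { }

    // Copy constructor: start empty, then reuse operator = for the deep copy
    StudentList(const StudentList & other) : _first(nullptr)
    {
        *this = other;
    }

    // Assignment: discard our own nodes, then copy every node of other
    StudentList & operator = (const StudentList & other)
    {
        if (this != &other)                  // s = s must not delete s's nodes
        {
            makeEmpty();
            Node * last = nullptr;           // last node copied so far
            for (Node * p = other._first; p != nullptr; p = p -> _next)
            {
                Node * copy = new Node(p -> _name, p -> _year);
                if (last == nullptr)
                    _first = copy;
                else
                    last -> _next = copy;
                last = copy;
            }
        }
        return *this;                        // allows a = b = c;
    }

    // Destructor: recycle every node this list owns
    ~StudentList() { makeEmpty(); }

    bool isEmpty() const { return _first == nullptr; }

    void makeEmpty()
    {
        while (_first != nullptr)
        {
            Node * d = _first;
            _first = _first -> _next;
            delete d;
        }
    }

    void insert(std::string name, int year)
    {
        Node * q = nullptr;                  // trailing pointer
        Node * p = _first;                   // leading pointer
        while (p != nullptr && p -> _year >= year)
        {
            q = p;
            p = p -> _next;
        }
        Node * newNode = new Node(name, year);
        newNode -> _next = p;
        if (q == nullptr)
            _first = newNode;
        else
            q -> _next = newNode;
    }

    // Precondition: !isEmpty()
    std::string getFirst() const { return _first -> _name; }

    // Precondition: !isEmpty()
    void removeFirst()
    {
        Node * old = _first;
        _first = _first -> _next;
        delete old;
    }

  private:
    struct Node
    {
        Node(std::string name, int year)
          : _name(name), _year(year), _next(nullptr) { }
        std::string _name;
        int _year;
        Node * _next;
    };

    Node * _first;
};
```

With these methods in place, the foo() and bar() scenario from Section II.E.2 is safe. The parameter x and the local y each get their own nodes, so destroying them leaves s's list intact.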
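A sketch of the header-node idea from Section III.A follows, assuming the header is an ordinary Node whose _name and _year are simply ignored. The trailing pointer now starts on the header, so every real node has a predecessor. As a result, insert() and remove() no longer need the "is this the first node?" special case. This is an illustration, not the projected studentlisth.cc, and copying is again disabled to keep it short.

```cpp
#include <cassert>
#include <string>

// Sketch: singly linked StudentList with a header node (still null-terminated).
class StudentList
{
  public:
    // An "empty" list already contains the header; its _name/_year are unused.
    StudentList() : _header(new Node("", 0)) { }
    ~StudentList() { makeEmpty(); delete _header; }   // the header goes last

    StudentList(const StudentList &) = delete;           // copying omitted here
    StudentList & operator = (const StudentList &) = delete;

    bool isEmpty() const { return _header -> _next == nullptr; }

    void makeEmpty()                     // recycles data nodes, keeps the header
    {
        Node * p = _header -> _next;
        while (p != nullptr)
        {
            Node * d = p;
            p = p -> _next;
            delete d;
        }
        _header -> _next = nullptr;
    }

    void insert(std::string name, int year)
    {
        Node * newNode = new Node(name, year);
        Node * q = _header;              // trailing pointer starts on the header
        Node * p = _header -> _next;     // leading pointer
        while (p != nullptr && p -> _year >= year)
        {
            q = p;
            p = p -> _next;
        }
        newNode -> _next = p;
        q -> _next = newNode;            // q is never null: no special case
    }

    // Precondition: !isEmpty()
    std::string getFirst() const { return _header -> _next -> _name; }

    // Precondition: !isEmpty()
    void removeFirst()
    {
        Node * old = _header -> _next;
        _header -> _next = old -> _next;
        delete old;
    }

    void remove(std::string name)
    {
        Node * q = _header;
        Node * p = _header -> _next;
        while (p != nullptr && p -> _name != name)
        {
            q = p;
            p = p -> _next;
        }
        if (p != nullptr)                // found: unlink with no special case
        {
            q -> _next = p -> _next;
            delete p;
        }
    }

  private:
    struct Node
    {
        Node(std::string name, int year)
          : _name(name), _year(year), _next(nullptr) { }
        std::string _name;
        int _year;
        Node * _next;
    };

    Node * _header;
};
```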
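For the circular version in Section III.B, here is a sketch that assumes, as the lecture suggests, a header whose _year is 0 and real class years of 1 or more. The header's year is what stops insert()'s loop, so its p != NULL test disappears entirely. remove(name) still needs a test, here p != _header, because this sketch does not use the header's _name as a sentinel. Again, this is an illustration rather than the projected studentlisthc.cc.

```cpp
#include <cassert>
#include <string>

// Sketch: circular singly linked StudentList with a header node whose
// year (0) is smaller than any real class year, so it doubles as a trailer.
class StudentList
{
  public:
    StudentList() : _header(new Node("", 0))
    {
        _header -> _next = _header;      // empty list: header points to itself
    }
    ~StudentList() { makeEmpty(); delete _header; }

    StudentList(const StudentList &) = delete;           // copying omitted here
    StudentList & operator = (const StudentList &) = delete;

    bool isEmpty() const { return _header -> _next == _header; }

    void makeEmpty()
    {
        Node * p = _header -> _next;
        while (p != _header)             // the header, not null, marks the end
        {
            Node * d = p;
            p = p -> _next;
            delete d;
        }
        _header -> _next = _header;
    }

    // Precondition: year >= 1, so the loop always stops on the header at worst
    void insert(std::string name, int year)
    {
        Node * newNode = new Node(name, year);
        Node * q = _header;
        Node * p = _header -> _next;
        while (p -> _year >= year)       // no null test needed
        {
            q = p;
            p = p -> _next;
        }
        newNode -> _next = p;
        q -> _next = newNode;
    }

    // Precondition: !isEmpty()
    std::string getFirst() const { return _header -> _next -> _name; }

    // Precondition: !isEmpty()
    void removeFirst()
    {
        Node * old = _header -> _next;
        _header -> _next = old -> _next;
        delete old;
    }

    void remove(std::string name)
    {
        Node * q = _header;
        Node * p = _header -> _next;
        while (p != _header && p -> _name != name)
        {
            q = p;
            p = p -> _next;
        }
        if (p != _header)                // found a real node
        {
            q -> _next = p -> _next;
            delete p;
        }
    }

  private:
    struct Node
    {
        Node(std::string name, int year)
          : _name(name), _year(year), _next(nullptr) { }
        std::string _name;
        int _year;
        Node * _next;
    };

    Node * _header;
};
```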
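The doubly linked, circular, header version from Section III.C can be sketched under the same assumptions as the circular sketch: header year 0 and real years of 1 or more. Because each node knows its predecessor, insert() and remove() need only a leading pointer. The predecessor is always available as p -> _prev, and an empty list is the "doubly-narcissistic" header. Two members are hypothetical additions for illustration, not part of the lecture's interface:

- getLast(), which shows the header's prev reaching the last node in O(1);
- the private unlink() helper.

```cpp
#include <cassert>
#include <string>

// Sketch: doubly linked, circular StudentList with a header node (year 0).
class StudentList
{
  public:
    StudentList() : _header(new Node("", 0))
    {
        _header -> _next = _header;      // empty list: the header points
        _header -> _prev = _header;      // to itself in both directions
    }
    ~StudentList() { makeEmpty(); delete _header; }

    StudentList(const StudentList &) = delete;           // copying omitted here
    StudentList & operator = (const StudentList &) = delete;

    bool isEmpty() const { return _header -> _next == _header; }

    void makeEmpty()                     // recycles data nodes, keeps the header
    {
        while (!isEmpty())
            unlink(_header -> _next);
    }

    // Precondition: year >= 1
    void insert(std::string name, int year)
    {
        Node * p = _header -> _next;     // only a leading pointer is needed
        while (p -> _year >= year)
            p = p -> _next;
        Node * newNode = new Node(name, year);
        newNode -> _next = p;            // new node goes just before p
        newNode -> _prev = p -> _prev;
        p -> _prev -> _next = newNode;
        p -> _prev = newNode;
    }

    // Precondition: !isEmpty()
    std::string getFirst() const { return _header -> _next -> _name; }

    // Hypothetical accessor: the header's prev reaches the last node in O(1).
    // Precondition: !isEmpty()
    std::string getLast() const { return _header -> _prev -> _name; }

    // Precondition: !isEmpty()
    void removeFirst() { unlink(_header -> _next); }

    void remove(std::string name)
    {
        Node * p = _header -> _next;
        while (p != _header && p -> _name != name)
            p = p -> _next;
        if (p != _header)
            unlink(p);
    }

  private:
    struct Node
    {
        Node(std::string name, int year)
          : _name(name), _year(year), _next(nullptr), _prev(nullptr) { }
        std::string _name;
        int _year;
        Node * _next;
        Node * _prev;
    };

    // Hypothetical helper: splice p out using its own two links, then recycle it
    void unlink(Node * p)
    {
        p -> _prev -> _next = p -> _next;
        p -> _next -> _prev = p -> _prev;
        delete p;
    }

    Node * _header;
};
```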
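To illustrate the STL points in Section IV.B, here is a sketch built on a hypothetical Student struct with name and year fields. The handout's own Student class, isWaiting(), and printReport() are not reproduced here. The three functions show the following:

- isWaiting() follows the hand-written loop described in IV.B.4, using a const_iterator because the list is passed by const reference;
- isWaitingFind() performs the same search with std::find from <algorithm>;
- namesBackward() (also hypothetical) walks the list with a const_reverse_iterator.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <string>

// Hypothetical Student type for illustration only.
struct Student
{
    std::string name;
    int year;
};

// One Student object per student; the list holds pointers to them.
typedef std::list<Student *> WaitingList;

// Hand-written linear search in the "C++ idiom" style
bool isWaiting(const WaitingList & waiting, const Student * s)
{
    for (WaitingList::const_iterator iter = waiting.begin();
         iter != waiting.end(); ++iter)
        if (*iter == s)                  // compare the stored pointers
            return true;
    return false;
}

// The same search using the standard algorithm std::find
bool isWaitingFind(const WaitingList & waiting, const Student * s)
{
    return std::find(waiting.begin(), waiting.end(), s) != waiting.end();
}

// Visit the list backwards with a const_reverse_iterator
std::string namesBackward(const WaitingList & waiting)
{
    std::string result;
    for (WaitingList::const_reverse_iterator iter = waiting.rbegin();
         iter != waiting.rend(); ++iter)
        result += (*iter) -> name + " ";
    return result;
}
```

Because the elements are Student pointers, isWaiting() checks whether that particular Student object is on the list, not merely whether some student with an equal name is.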